Spectral clustering and the high-dimensional Stochastic Block Model

Authors

  • Karl Rohe
  • Sourav Chatterjee
  • Bin Yu
Abstract

Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The Stochastic Block Model (Holland et al., 1983) is a social network model with well defined communities; each node is a member of one community. For a network generated from the Stochastic Block Model, we bound the number of nodes “misclustered” by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the Stochastic Block Model, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a “population” normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.

AMS 2000 subject classifications: Primary 62H30, 62H25; secondary 60B20.
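
To make the setting in the abstract concrete, the following is a minimal sketch of spectral clustering applied to a graph sampled from a Stochastic Block Model. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the normalized graph Laplacian, so the sketch assumes the common convention L = D^{-1/2} A D^{-1/2} and keeps the k eigenvectors whose eigenvalues are largest in absolute value. The function names (`sample_sbm`, `normalized_laplacian`, `kmeans`, `spectral_clustering`, `count_misclustered`), the two-parameter edge probabilities `p_in` and `p_out`, and the small hand-written k-means with farthest-point initialization are all hypothetical choices made for this example.

```python
from itertools import permutations

import numpy as np


def sample_sbm(block_sizes, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a simple SBM.

    Assumption: one within-block probability p_in and one
    between-block probability p_out; no self-loops.
    """
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Draw each edge once in the upper triangle, then mirror it.
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    adjacency = (upper | upper.T).astype(float)
    return adjacency, labels


def normalized_laplacian(adjacency):
    """Return D^{-1/2} A D^{-1/2} (assumed convention for this sketch)."""
    degrees = adjacency.sum(axis=1)
    degrees[degrees == 0] = 1.0  # avoid division by zero for isolated nodes
    inv_sqrt = 1.0 / np.sqrt(degrees)
    return adjacency * inv_sqrt[:, None] * inv_sqrt[None, :]


def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point init."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # keep the old center if a cluster empties
                centers[j] = points[assign == j].mean(axis=0)
    return assign


def spectral_clustering(adjacency, k):
    """Cluster nodes by running k-means on k leading eigenvectors."""
    lap = normalized_laplacian(adjacency)
    eigvals, eigvecs = np.linalg.eigh(lap)
    top = np.argsort(-np.abs(eigvals))[:k]  # largest |eigenvalue| first
    return kmeans(eigvecs[:, top], k)


def count_misclustered(true_labels, est_labels, k):
    """Count label disagreements, minimized over relabelings of clusters."""
    return min(
        int(np.sum(np.array(perm)[est_labels] != true_labels))
        for perm in permutations(range(k))
    )
```

As a usage example, one could draw a graph with `sample_sbm([50, 50], 0.5, 0.05, np.random.default_rng(0))`, pass the adjacency matrix to `spectral_clustering(adjacency, 2)`, and compare the result with the true labels via `count_misclustered`. The count is minimized over permutations because cluster labels are only identifiable up to relabeling; the brute-force search over permutations is practical only for small k.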

Similar resources

Spectral Clustering of graphs with the Bethe Hessian

Spectral clustering is a standard approach to label nodes on a graph by studying the (largest or lowest) eigenvalues of a symmetric real matrix such as e.g. the adjacency or the Laplacian. Recently, it has been argued that using instead a more complicated, non-symmetric and higher dimensional operator, related to the non-backtracking walk on the graph, leads to improved performance in detecting...

Spectral Clustering and Community Detection in Labeled Graphs

We study spectral clustering techniques to learn community structures in labeled random graphs where edge labels from a label set L = {1, ..., L} are drawn according to discrete probability distributions parametrized by community membership of the two end-nodes of the edge. This is a strict generalization of the standard stochastic block model for community detection.

Consistent parameter estimation in general stochastic block models with overlaps

This paper considers the parameter estimation problem in Stochastic Block Model with Overlaps (SBMO), which is a quite general instance of random graph model allowing for overlapping community structure. We present the new algorithm successive projection overlapping clustering (SPOC) which combines the ideas of spectral clustering and geometric approach for separable non-negative matrix factori...

Community Detection with the Non-Backtracking Operator

Community detection consists in identification of groups of similar items within a population. In the context of online social networks, it is a useful primitive for recommending either contacts or news items to users. We will consider a particular generative probabilistic model for the observations, namely the so-called stochastic block model and prove that the non-backtracking operator provid...

Approximation solution of two-dimensional linear stochastic Volterra-Fredholm integral equation via two-dimensional Block-pulse functions

In this paper, a numerical efficient method based on two-dimensional block-pulse functions (BPFs) is proposed to approximate a solution of the two-dimensional linear stochastic Volterra-Fredholm integral equation. Finally the accuracy of this method will be shown by an example.

Publication date: 2010